nyc_medicare_2017 <- medicare_part_b_2017 %>%
  filter(`Provider Zip Code` %in% nyc_zip_codes) %>%
  mutate(zip_3 = substr(`Provider Zip Code`, 1, 3),
         borough = ifelse(zip_3 == 104, "Bronx", ifelse(zip_3 == 112, "Brooklyn", 
                                                        ifelse(zip_3 == 100, "Manhattan", 
                                                               ifelse(zip_3 == 103, "Staten Island", "Queens")))),
         health_plus = ifelse(`Provider Name` %in% nyc_health_plus, "Y", "N"))
nyc_hospitals <- nyc_medicare_2017$`Provider Name` %>%
  unique
nyc_hospitals_borough <- nyc_medicare_2017 %>%
  group_by(borough) %>%
  summarise(number_hospitals = n_distinct(`Provider Name`))
datatable(nyc_hospitals_borough)

Okay so if we look at this nyc_hospitals_borough, we clearly see that there is a geographical imbalance in terms of the number of hospitals per borough. Queens is probably the most underserved borough in this regard since it is also the biggest borough by land area, right?

We also have another problem. How do you determine if a procedure is the most common? Do you sum up the number of procedures in a given year, and then rank? Or do you find the most common procedure at each hospital?

procedures_count <- nyc_medicare_2017 %>%
  group_by(`DRG Definition`) %>%
  summarise(discharge_count = sum(`Total Discharges`)) %>%
  arrange(desc(discharge_count))
head(procedures_count, n = 10)
## # A tibble: 10 x 2
##    `DRG Definition`                                              discharge_count
##    <chr>                                                                   <dbl>
##  1 871 - SEPTICEMIA OR SEVERE SEPSIS W/O MV >96 HOURS W MCC                10278
##  2 470 - MAJOR JOINT REPLACEMENT OR REATTACHMENT OF LOWER EXTRE~            8317
##  3 291 - HEART FAILURE & SHOCK W MCC                                        5879
##  4 392 - ESOPHAGITIS, GASTROENT & MISC DIGEST DISORDERS W/O MCC             2828
##  5 312 - SYNCOPE & COLLAPSE                                                 2613
##  6 690 - KIDNEY & URINARY TRACT INFECTIONS W/O MCC                          2482
##  7 872 - SEPTICEMIA OR SEVERE SEPSIS W/O MV >96 HOURS W/O MCC               2324
##  8 190 - CHRONIC OBSTRUCTIVE PULMONARY DISEASE W MCC                        2003
##  9 683 - RENAL FAILURE W CC                                                 1874
## 10 378 - G.I. HEMORRHAGE W CC                                               1830

So, if we look at the 10 most common procedures by the total number of discharges, septicemia/sepsis treatments are leading the pack.

nyc_medicare_2017_ten_common <- nyc_medicare_2017 %>%
  filter(`DRG Definition` %in% unlist(procedures_count$`DRG Definition`)[1:10]) %>%
  select(`Provider Name`, `DRG Definition`, `Total Discharges`, `Average Covered Charges`,`Average Total Payments`, `Average Medicare Payments` ,borough, health_plus) %>%
  mutate(DRG_num = as.character(substr(`DRG Definition`, 1, 3)))
cost_boxplot <- ggplot(nyc_medicare_2017_ten_common, aes(DRG_num, `Average Total Payments`)) + 
  geom_boxplot() + 
  labs(title = "Box Plot of Average Total Payments by DRG",
       subtitle = "Each observation is the averal total payments attributed to each hospital",
       x = "DRG Number",
       y = "Average Total Payments")
cost_boxplot

Okay, so there is a huge variance between hospitals for each of the ten most common procedures. Are there hospitals that are consistently above or below average?

nyc_medicare_2017_ten_common__total_payments_zscores <- nyc_medicare_2017_ten_common %>%
  group_by(DRG_num) %>%
  mutate(mean_total_payment = mean(`Average Total Payments`),
         sd_total_payment = sd(`Average Total Payments`)) %>%
  ungroup() %>%
  mutate(z_score = (`Average Total Payments`-mean_total_payment)/sd_total_payment)
hospital_total_payments_scatter <- ggplot(nyc_medicare_2017_ten_common__total_payments_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Average Total Payments Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital")
ggplotly(hospital_total_payments_scatter)

Scandal! All NYC Health-Plus, ie public hospitals, charge more than private ones, with the exception of Coney Island Hospital. More precisely, on average, NYC Health+ hospitals are paid above-average amounts for the procedures performed. Well, I assume that Medicare does not just cover every single part of the procedure. Some parts of the entire treatment process might not be billable to Medicare. So what if we look at the Average Covered Charges?

nyc_medicare_2017_ten_common_covered_charges_zscores <- nyc_medicare_2017_ten_common %>%
  group_by(DRG_num) %>%
  mutate(mean_covered_charges = mean(`Average Covered Charges`),
         sd_covered_charges = sd(`Average Covered Charges`)) %>%
  ungroup() %>%
  mutate(z_score = (`Average Covered Charges`-mean_covered_charges)/sd_covered_charges)
hospital_covered_charges_scatter <- ggplot(nyc_medicare_2017_ten_common_covered_charges_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Average Covered Charges Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital")
ggplotly(hospital_covered_charges_scatter)

Evidently, this picture is a bit more mixed. So the Average Covered Charges of NYC Health+ hospitals tend to be below average whereas private that of private hospital systems tend to be above average. But if you think about it, what we are really interested in is, how much do beneficiaries (and their insurance providers) have to pay on top of what Medicare pays out. In other words, we really want to look into the co-payments, deductibles, and other additional payments from third parties for coordination of benefits.
Non Medicare Payments = Average Total Payments - Average Medicare Payments.

nyc_medicare_2017_ten_common_non_medicare_payments_zscores <- nyc_medicare_2017_ten_common %>%
  group_by(DRG_num) %>%
  mutate(average_non_medicare_payments = `Average Total Payments` - `Average Medicare Payments`,
         mean_non_medicare_charges = mean(average_non_medicare_payments),
         sd_non_medicare_charges = sd(average_non_medicare_payments)) %>%
  ungroup() %>%
  mutate(z_score = (average_non_medicare_payments-mean_non_medicare_charges)/sd_non_medicare_charges)
hospital_non_medicare_payments_scatter <- ggplot(nyc_medicare_2017_ten_common_non_medicare_payments_zscores, aes(`Provider Name`, z_score, label = DRG_num)) + 
  geom_point(aes(col = health_plus)) + 
  geom_hline(yintercept = 0) +
  coord_flip() + 
  labs(title = "Average Covered Charges Scatterplot",
       subtitle = "No. of std dev for each common DRG by hospital",
       y = "Standard Deviations",
       x = "Hospital")
ggplotly(hospital_non_medicare_payments_scatter)

Okay, so this presents a much more nuanced picture. Even though Total Payments might be high at NYC Health+ hospitals, the amount that not covered by Medicare tends to be below average at NYC Health+ hospitals.